A Two-Step Approach for Tree-structured XPath Query Reduction
Authors
Abstract
XML data has a very flexible tree structure, which makes storing and retrieving it difficult. The node numbering scheme is one of the most popular approaches to storing XML in relational databases. Combined with the node numbering storage scheme, structural joins can efficiently process the hierarchical relationships in XML. However, processing a tree-structured XPath query that contains several hierarchical relationships and conditions on XML data requires many structural joins, which results in a high query execution cost. This paper introduces mechanisms that reduce XPath queries containing branch nodes to a much more efficient form with fewer structural joins. A two-step approach is proposed. The first step merges duplicate nodes in the tree-structured query; the second step divides the query into sub-queries, shortens the paths, and then merges the sub-queries back together. The proposed approach can contribute substantially to the efficient execution of XML queries. Experimental results show that the proposed scheme can reduce the query execution cost by up to an order of magnitude. Keywords: XML, XPath, tree-structured query, query reduction.
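To make the cost argument concrete, the following is a minimal sketch of a node numbering scheme and a structural join. It does not reproduce the paper's own scheme or its reduction rules, which the abstract does not specify. It assumes a generic region encoding in which every element receives a hypothetical `(start, end, level)` triple, and the names `number_nodes` and `structural_join` are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def number_nodes(root):
    """Assign a (start, end, level) triple to every element in one depth-first pass.

    Assumed region encoding: node a is an ancestor of node d exactly when
    a.start < d.start and d.end < a.end.
    """
    table = defaultdict(list)  # tag -> rows in document order
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        row = [counter, None, level]  # end is filled in after the children
        table[elem.tag].append(row)
        for child in elem:
            visit(child, level + 1)
        counter += 1
        row[1] = counter

    visit(root, 0)
    return {tag: [tuple(r) for r in rows] for tag, rows in table.items()}


def structural_join(ancestors, descendants, parent_child=False):
    """Return (a, d) pairs where a contains d.

    A naive nested loop is used for clarity; it is not an optimized join.
    With parent_child=True only direct children are kept.
    """
    pairs = []
    for a in ancestors:
        for d in descendants:
            contains = a[0] < d[0] and d[1] < a[1]
            if contains and (not parent_child or d[2] == a[2] + 1):
                pairs.append((a, d))
    return pairs


doc = ET.fromstring(
    "<lib><book><title/><author/></book><book><title/></book></lib>"
)
nodes = number_nodes(doc)

# One structural join evaluates a single step such as book/author.
pairs = structural_join(nodes["book"], nodes["author"], parent_child=True)
```

Under this encoding each hierarchical step in a query costs one such join, so a query such as `//lib/book[author]/title` needs one join per edge of its query tree. That is why a rewritten query with fewer edges would be expected to run with fewer joins.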
Similar resources
Accelerating XPath Evaluation against XML Streams
Data streams are an emerging technology for data dissemination in cases where the data throughput or size make it unfeasible to rely on the conventional approach based on storing the data before processing it. Areas where data streams are applied include monitoring of scientific data (astronomy, meteorology), control data (traffic, logistics, networks), and financial data (bank transactions). Q...
Query reasoning on data trees with counting
Regular path expressions represent the navigation core of the XPath query language for semi-structured data (XML), and it has been characterized as the First Order Logic with Two Variables (FO²). Data tests refer to (dis)equality comparisons on data tree models, which are unranked trees with two kinds of labels, propositions from a finite alphabet, and data values from a possibly infinite alpha...
I4-D15b Scalable, Space-optimal Implementation of Web Queries
Tree queries and tree data have proved to be interesting limitations in the quest for scalable query evaluation. In this deliverable, we explore two directions on how to extend the results from tree query languages such as XPath beyond tree data: (1) Can we find an interesting and practically relevant class of (data) graphs that is a proper superset of trees, yet to which algorithms such as twig j...
Automata for Analyzing and Querying Compressed Documents
In a first part of this work, tree/dag automata are defined as extensions of (unranked) tree automata which can run indifferently on trees or dags; they can thus serve as tools for analyzing or querying any semi-structured document, whether or not given in a compressed format. In a second part of the work, we present a method for evaluating positive unary queries, expressed in terms of Core XPa...
Enhancing the Tree Awareness of a Relational DBMS: Adding Staircase Join to PostgreSQL
Given a suitable encoding, any relational DBMS is able to answer queries on tree-structured data. However, conventional relational databases are generally not (made) aware of the underlying tree structure and thus fail to make full use of the encoded information. The staircase join is a new join algorithm intended to enhance the tree awareness of a relational DBMS. It was developed to speed up ...